Detecting Latent User Properties in Social Media
Abstract
The ability to identify user attributes such as gender, age, regional origin, and political orientation solely from user language in social media such as Twitter or similar highly informal content has important applications in advertising, personalization, and recommendation. This paper includes a novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying these four user attributes. We propose new sociolinguistics-based features for classifying user attributes in Twitter-style informal written genres, as distinct from the other primarily spoken genres previously studied in the user-property classification literature. Our models, singly and in ensemble, significantly outperform baseline models in all cases.
Similar resources
Inferring Latent User Properties from Texts Published in Social Media
We demonstrate an approach to predict latent personal attributes including user demographics, online personality, emotions and sentiments from texts published on Twitter. We rely on machine learning and natural language processing techniques to learn models from user communications. We first examine individual tweets to detect emotions and opinions emanating from them, and then analyze all the ...
Social Media Predictive Analytics
The recent explosion of social media services like Twitter, Google+ and Facebook has led to an interest in social media predictive analytics – automatically inferring hidden information from the large amounts of freely available content. It has a number of applications, including: online targeted advertising, personalized marketing, large-scale passive polling and real-time live polling, person...
Hierarchical Bayesian Models for Latent Attribute Detection in Social Media
We present several novel minimally-supervised models for detecting latent attributes of social media users, with a focus on ethnicity and gender. Previous work on ethnicity detection has used coarse-grained widely separated classes of ethnicity and assumed the existence of large amounts of training data such as the US census, simplifying the problem. Instead, we examine content generated by use...
Similarity measurement for describe user images in social media
Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...
Texts and Social Users Using Time Series and Latent Topics
Knowledge discovery has received tremendous interests and fast developments in both text mining and social user mining. The main purpose is to search massive volumes of data for patterns as so-called knowledge. Knowledge can exist in different formats such as texts or numbers. Knowledge can be observed or hidden in different hierarchies. Knowledge can even be user-generated such as social conte...